{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to DataFrames\n",
    "**[Bogumił Kamiński](http://bogumilkaminski.pl/about/), November 15, 2020**\n",
    "\n",
    "Let's get started by loading the `DataFrames` package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "using DataFrames, Random"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Constructors and conversion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Constructors\n",
    "\n",
    "In this section, you'll see many ways to create a `DataFrame` using the `DataFrame()` constructor.\n",
    "\n",
    "First, we could create an empty DataFrame,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th></tr><tr><th></th></tr></thead><tbody><p>0 rows × 0 columns</p></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|}\n",
       "\t& \\\\\n",
       "\t\\hline\n",
       "\t& \\\\\n",
       "\t\\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m0×0 DataFrame\u001b[0m"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or we could call the constructor using keyword arguments to add columns to the `DataFrame`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>A</th><th>B</th><th>C</th><th>fixed</th></tr><tr><th></th><th>Int64</th><th>Float64</th><th>String</th><th>Int64</th></tr></thead><tbody><p>3 rows × 4 columns</p><tr><th>1</th><td>1</td><td>0.047785</td><td>hAj</td><td>1</td></tr><tr><th>2</th><td>2</td><td>0.785473</td><td>rLw</td><td>1</td></tr><tr><th>3</th><td>3</td><td>0.707365</td><td>3AL</td><td>1</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccc}\n",
       "\t& A & B & C & fixed\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64 & String & Int64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 0.047785 & hAj & 1 \\\\\n",
       "\t2 & 2 & 0.785473 & rLw & 1 \\\\\n",
       "\t3 & 3 & 0.707365 & 3AL & 1 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×4 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m A     \u001b[0m\u001b[1m B        \u001b[0m\u001b[1m C      \u001b[0m\u001b[1m fixed \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m String \u001b[0m\u001b[90m Int64 \u001b[0m\n",
       "─────┼────────────────────────────────\n",
       "   1 │     1  0.047785  hAj         1\n",
       "   2 │     2  0.785473  rLw         1\n",
       "   3 │     3  0.707365  3AL         1"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(A=1:3, B=rand(3), C=randstring.([3,3,3]), fixed=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "note in column `:fixed` that scalars get automatically broadcasted."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create a `DataFrame` from a dictionary, in which case keys from the dictionary will be sorted to create the `DataFrame` columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>A</th><th>B</th><th>C</th><th>fixed</th></tr><tr><th></th><th>Int64</th><th>Bool</th><th>Char</th><th>Array…</th></tr></thead><tbody><p>2 rows × 4 columns</p><tr><th>1</th><td>1</td><td>1</td><td>a</td><td>[1, 1]</td></tr><tr><th>2</th><td>2</td><td>0</td><td>b</td><td>[1, 1]</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccc}\n",
       "\t& A & B & C & fixed\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Bool & Char & Array…\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1 & a & [1, 1] \\\\\n",
       "\t2 & 2 & 0 & b & [1, 1] \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×4 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m A     \u001b[0m\u001b[1m B     \u001b[0m\u001b[1m C    \u001b[0m\u001b[1m fixed  \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Bool  \u001b[0m\u001b[90m Char \u001b[0m\u001b[90m Array… \u001b[0m\n",
       "─────┼────────────────────────────\n",
       "   1 │     1   true  a     [1, 1]\n",
       "   2 │     2  false  b     [1, 1]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = Dict(\"A\" => [1,2], \"B\" => [true, false], \"C\" => ['a', 'b'], \"fixed\" => Ref([1,1]))\n",
    "DataFrame(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time we used `Ref` to protect a vector from being treated as a column and forcing broadcasting it into every row of `:fixed` column (note that the `[1,1]` vector is aliased in each row)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rather than explicitly creating a dictionary first, as above, we could pass `DataFrame` arguments with the syntax of dictionary key-value pairs. \n",
    "\n",
    "Note that in this case, we use `Symbol`s to denote the column names and arguments are not sorted. For example, `:A`, the symbol, produces `A`, the name of the first column here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>A</th><th>B</th><th>C</th></tr><tr><th></th><th>Int64</th><th>Bool</th><th>Char</th></tr></thead><tbody><p>2 rows × 3 columns</p><tr><th>1</th><td>1</td><td>1</td><td>a</td></tr><tr><th>2</th><td>2</td><td>0</td><td>b</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& A & B & C\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Bool & Char\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1 & a \\\\\n",
       "\t2 & 2 & 0 & b \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m A     \u001b[0m\u001b[1m B     \u001b[0m\u001b[1m C    \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Bool  \u001b[0m\u001b[90m Char \u001b[0m\n",
       "─────┼────────────────────\n",
       "   1 │     1   true  a\n",
       "   2 │     2  false  b"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(:A => [1,2], :B => [true, false], :C => ['a', 'b'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although, in general, using `Symbol`s rather than strings to denote column names is preferred (as it is faster) DataFrames.jl accepts passing strings as column names, so this also works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>A</th><th>B</th><th>C</th></tr><tr><th></th><th>Int64</th><th>Bool</th><th>Char</th></tr></thead><tbody><p>2 rows × 3 columns</p><tr><th>1</th><td>1</td><td>1</td><td>a</td></tr><tr><th>2</th><td>2</td><td>0</td><td>b</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& A & B & C\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Bool & Char\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1 & a \\\\\n",
       "\t2 & 2 & 0 & b \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m A     \u001b[0m\u001b[1m B     \u001b[0m\u001b[1m C    \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Bool  \u001b[0m\u001b[90m Char \u001b[0m\n",
       "─────┼────────────────────\n",
       "   1 │     1   true  a\n",
       "   2 │     2  false  b"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(\"A\" => [1,2], \"B\" => [true, false], \"C\" => ['a', 'b'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also pass a vector of pairs, which is useful if it is constructed programatically:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>A</th><th>B</th><th>C</th><th>fixed</th></tr><tr><th></th><th>Int64</th><th>Bool</th><th>Char</th><th>String</th></tr></thead><tbody><p>2 rows × 4 columns</p><tr><th>1</th><td>1</td><td>1</td><td>a</td><td>const</td></tr><tr><th>2</th><td>2</td><td>0</td><td>b</td><td>const</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccc}\n",
       "\t& A & B & C & fixed\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Bool & Char & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1 & a & const \\\\\n",
       "\t2 & 2 & 0 & b & const \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×4 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m A     \u001b[0m\u001b[1m B     \u001b[0m\u001b[1m C    \u001b[0m\u001b[1m fixed  \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Bool  \u001b[0m\u001b[90m Char \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼────────────────────────────\n",
       "   1 │     1   true  a     const\n",
       "   2 │     2  false  b     const"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame([:A => [1,2], :B => [true, false], :C => ['a', 'b'], :fixed => \"const\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we create a `DataFrame` from a vector of vectors, and each vector becomes a column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x1</th><th>x2</th><th>x3</th></tr><tr><th></th><th>Float64</th><th>Float64</th><th>Float64</th></tr></thead><tbody><p>3 rows × 3 columns</p><tr><th>1</th><td>0.875991</td><td>0.185098</td><td>0.173807</td></tr><tr><th>2</th><td>0.0933727</td><td>0.0598802</td><td>0.290761</td></tr><tr><th>3</th><td>0.974681</td><td>0.24215</td><td>0.525832</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x1 & x2 & x3\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.875991 & 0.185098 & 0.173807 \\\\\n",
       "\t2 & 0.0933727 & 0.0598802 & 0.290761 \\\\\n",
       "\t3 & 0.974681 & 0.24215 & 0.525832 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x1        \u001b[0m\u001b[1m x2        \u001b[0m\u001b[1m x3       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64   \u001b[0m\u001b[90m Float64   \u001b[0m\u001b[90m Float64  \u001b[0m\n",
       "─────┼────────────────────────────────\n",
       "   1 │ 0.875991   0.185098   0.173807\n",
       "   2 │ 0.0933727  0.0598802  0.290761\n",
       "   3 │ 0.974681   0.24215    0.525832"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame([rand(3) for i in 1:3], :auto)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x1</th><th>x2</th><th>x3</th></tr><tr><th></th><th>Float64</th><th>Float64</th><th>Float64</th></tr></thead><tbody><p>3 rows × 3 columns</p><tr><th>1</th><td>0.4318</td><td>0.315872</td><td>0.0721239</td></tr><tr><th>2</th><td>0.287694</td><td>0.455928</td><td>0.521293</td></tr><tr><th>3</th><td>0.742307</td><td>0.940576</td><td>0.467023</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x1 & x2 & x3\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.4318 & 0.315872 & 0.0721239 \\\\\n",
       "\t2 & 0.287694 & 0.455928 & 0.521293 \\\\\n",
       "\t3 & 0.742307 & 0.940576 & 0.467023 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x1       \u001b[0m\u001b[1m x2       \u001b[0m\u001b[1m x3        \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64  \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64   \u001b[0m\n",
       "─────┼───────────────────────────────\n",
       "   1 │ 0.4318    0.315872  0.0721239\n",
       "   2 │ 0.287694  0.455928  0.521293\n",
       "   3 │ 0.742307  0.940576  0.467023"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame([rand(3) for i in 1:3], [:x1, :x2, :x3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x1</th><th>x2</th><th>x3</th></tr><tr><th></th><th>Float64</th><th>Float64</th><th>Float64</th></tr></thead><tbody><p>3 rows × 3 columns</p><tr><th>1</th><td>0.238806</td><td>0.638928</td><td>0.963401</td></tr><tr><th>2</th><td>0.397363</td><td>0.509695</td><td>0.895814</td></tr><tr><th>3</th><td>0.0809898</td><td>0.577027</td><td>0.640533</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x1 & x2 & x3\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.238806 & 0.638928 & 0.963401 \\\\\n",
       "\t2 & 0.397363 & 0.509695 & 0.895814 \\\\\n",
       "\t3 & 0.0809898 & 0.577027 & 0.640533 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x1        \u001b[0m\u001b[1m x2       \u001b[0m\u001b[1m x3       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64   \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64  \u001b[0m\n",
       "─────┼───────────────────────────────\n",
       "   1 │ 0.238806   0.638928  0.963401\n",
       "   2 │ 0.397363   0.509695  0.895814\n",
       "   3 │ 0.0809898  0.577027  0.640533"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame([rand(3) for i in 1:3], [\"x1\", \"x2\", \"x3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see you either pass a vector of column names as a second argument or `:auto` in which case column names are generated automatically."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In particular it is not allowed to pass a vector of scalars to `DataFrame` constructor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "ename": "LoadError",
     "evalue": "ArgumentError: 'Array{Int64,1}' iterates 'Int64' values, which doesn't satisfy the Tables.jl `AbstractRow` interface",
     "output_type": "error",
     "traceback": [
      "ArgumentError: 'Array{Int64,1}' iterates 'Int64' values, which doesn't satisfy the Tables.jl `AbstractRow` interface",
      "",
      "Stacktrace:",
      " [1] invalidtable(::Array{Int64,1}, ::Int64) at D:\\AppData\\.julia\\packages\\Tables\\iG2a3\\src\\tofromdatavalues.jl:42",
      " [2] iterate at D:\\AppData\\.julia\\packages\\Tables\\iG2a3\\src\\tofromdatavalues.jl:48 [inlined]",
      " [3] buildcolumns at D:\\AppData\\.julia\\packages\\Tables\\iG2a3\\src\\fallbacks.jl:199 [inlined]",
      " [4] columns at D:\\AppData\\.julia\\packages\\Tables\\iG2a3\\src\\fallbacks.jl:262 [inlined]",
      " [5] DataFrame(::Array{Int64,1}; copycols::Bool) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\other\\tables.jl:55",
      " [6] DataFrame(::Array{Int64,1}) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\other\\tables.jl:46",
      " [7] top-level scope at In[11]:1",
      " [8] include_string(::Function, ::Module, ::String, ::String) at .\\loading.jl:1091"
     ]
    }
   ],
   "source": [
    "DataFrame([1, 2, 3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead use a transposed vector if you have a vector of single values (in this way you effectively pass a two dimensional array to the constructor which is supported the same way as in vector of vectors case)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x1</th><th>x2</th><th>x3</th></tr><tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th></tr></thead><tbody><p>1 rows × 3 columns</p><tr><th>1</th><td>1</td><td>2</td><td>3</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x1 & x2 & x3\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Int64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 2 & 3 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x1    \u001b[0m\u001b[1m x2    \u001b[0m\u001b[1m x3    \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n",
       "─────┼─────────────────────\n",
       "   1 │     1      2      3"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(permutedims([1, 2, 3]), :auto)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also pass a vector of `NamedTuple`s to construct a `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Int64</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>2</td></tr><tr><th>2</th><td>3</td><td>4</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 2 \\\\\n",
       "\t2 & 3 & 4 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b     \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n",
       "─────┼──────────────\n",
       "   1 │     1      2\n",
       "   2 │     3      4"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v = [(a=1, b=2), (a=3, b=4)]\n",
    "DataFrame(v)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively you can pass a `NamedTuple` of vectors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Int64</th></tr></thead><tbody><p>3 rows × 2 columns</p><tr><th>1</th><td>1</td><td>11</td></tr><tr><th>2</th><td>2</td><td>12</td></tr><tr><th>3</th><td>3</td><td>13</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 11 \\\\\n",
       "\t2 & 2 & 12 \\\\\n",
       "\t3 & 3 & 13 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b     \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n",
       "─────┼──────────────\n",
       "   1 │     1     11\n",
       "   2 │     2     12\n",
       "   3 │     3     13"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n = (a=1:3, b=11:13)\n",
    "DataFrame(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we create a `DataFrame` from a matrix,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x1</th><th>x2</th><th>x3</th><th>x4</th></tr><tr><th></th><th>Float64</th><th>Float64</th><th>Float64</th><th>Float64</th></tr></thead><tbody><p>3 rows × 4 columns</p><tr><th>1</th><td>0.238832</td><td>0.633544</td><td>0.164044</td><td>0.870503</td></tr><tr><th>2</th><td>0.771345</td><td>0.0463644</td><td>0.393462</td><td>0.727146</td></tr><tr><th>3</th><td>0.599902</td><td>0.638985</td><td>0.202234</td><td>0.153096</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccc}\n",
       "\t& x1 & x2 & x3 & x4\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.238832 & 0.633544 & 0.164044 & 0.870503 \\\\\n",
       "\t2 & 0.771345 & 0.0463644 & 0.393462 & 0.727146 \\\\\n",
       "\t3 & 0.599902 & 0.638985 & 0.202234 & 0.153096 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×4 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x1       \u001b[0m\u001b[1m x2        \u001b[0m\u001b[1m x3       \u001b[0m\u001b[1m x4       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64  \u001b[0m\u001b[90m Float64   \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64  \u001b[0m\n",
       "─────┼─────────────────────────────────────────\n",
       "   1 │ 0.238832  0.633544   0.164044  0.870503\n",
       "   2 │ 0.771345  0.0463644  0.393462  0.727146\n",
       "   3 │ 0.599902  0.638985   0.202234  0.153096"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(rand(3,4), :auto)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and here we do the same but also pass column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><th></th><th>Float64</th><th>Float64</th><th>Float64</th><th>Float64</th></tr></thead><tbody><p>3 rows × 4 columns</p><tr><th>1</th><td>0.0595349</td><td>0.628032</td><td>0.0752647</td><td>0.0960774</td></tr><tr><th>2</th><td>0.45101</td><td>0.441358</td><td>0.635099</td><td>0.106623</td></tr><tr><th>3</th><td>0.599683</td><td>0.353986</td><td>0.669113</td><td>0.322191</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccc}\n",
       "\t& a & b & c & d\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.0595349 & 0.628032 & 0.0752647 & 0.0960774 \\\\\n",
       "\t2 & 0.45101 & 0.441358 & 0.635099 & 0.106623 \\\\\n",
       "\t3 & 0.599683 & 0.353986 & 0.669113 & 0.322191 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×4 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a         \u001b[0m\u001b[1m b        \u001b[0m\u001b[1m c         \u001b[0m\u001b[1m d         \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64   \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64   \u001b[0m\u001b[90m Float64   \u001b[0m\n",
       "─────┼───────────────────────────────────────────\n",
       "   1 │ 0.0595349  0.628032  0.0752647  0.0960774\n",
       "   2 │ 0.45101    0.441358  0.635099   0.106623\n",
       "   3 │ 0.599683   0.353986  0.669113   0.322191"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(rand(3,4), Symbol.('a':'d'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><th></th><th>Float64</th><th>Float64</th><th>Float64</th><th>Float64</th></tr></thead><tbody><p>3 rows × 4 columns</p><tr><th>1</th><td>0.747615</td><td>0.161797</td><td>0.307131</td><td>0.139517</td></tr><tr><th>2</th><td>0.122387</td><td>0.213337</td><td>0.200053</td><td>0.581051</td></tr><tr><th>3</th><td>0.900947</td><td>0.904813</td><td>0.0886423</td><td>0.708975</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cccc}\n",
       "\t& a & b & c & d\\\\\n",
       "\t\\hline\n",
       "\t& Float64 & Float64 & Float64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 0.747615 & 0.161797 & 0.307131 & 0.139517 \\\\\n",
       "\t2 & 0.122387 & 0.213337 & 0.200053 & 0.581051 \\\\\n",
       "\t3 & 0.900947 & 0.904813 & 0.0886423 & 0.708975 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m3×4 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a        \u001b[0m\u001b[1m b        \u001b[0m\u001b[1m c         \u001b[0m\u001b[1m d        \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Float64  \u001b[0m\u001b[90m Float64  \u001b[0m\u001b[90m Float64   \u001b[0m\u001b[90m Float64  \u001b[0m\n",
       "─────┼─────────────────────────────────────────\n",
       "   1 │ 0.747615  0.161797  0.307131   0.139517\n",
       "   2 │ 0.122387  0.213337  0.200053   0.581051\n",
       "   3 │ 0.900947  0.904813  0.0886423  0.708975"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(rand(3,4), string.('a':'d'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is how you can create a data frame with no rows, but with predefined columns and their types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>A</th><th>B</th><th>C</th></tr><tr><th></th><th>Int64</th><th>Float64</th><th>String</th></tr></thead><tbody><p>0 rows × 3 columns</p></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& A & B & C\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64 & String\\\\\n",
       "\t\\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m0×3 DataFrame\u001b[0m"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(A=Int[], B=Float64[], C=String[])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can create a `DataFrame` by copying an existing `DataFrame`.\n",
    "\n",
    "Note that `copy` also copies the vectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(false, true, true, false)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(a=1:2, b='a':'b')\n",
    "y = copy(x)\n",
    "(x === y), isequal(x, y), (x.a == y.a), (x.a === y.a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling `DataFrame` on a `DataFrame` object works like `copy`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(false, true, true, false)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(a=1:2, b='a':'b')\n",
    "y = DataFrame(x)\n",
    "(x === y), isequal(x, y), (x.a == y.a), (x.a === y.a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can avoid copying of columns of a data frame (if it is possible) by passing `copycols=false` keyword argument:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(false, true, true, true)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(a=1:2, b='a':'b')\n",
    "y = DataFrame(x, copycols=false)\n",
    "(x === y), isequal(x, y), (x.a == y.a), (x.a === y.a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same rule applies to other constructors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(false, true)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = [1, 2, 3]\n",
    "df1 = DataFrame(a=a)\n",
    "df2 = DataFrame(a=a, copycols=false)\n",
    "df1.a === a, df2.a === a"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can create a similar uninitialized `DataFrame` based on an original one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>1 rows × 2 columns</p><tr><th>1</th><td>1</td><td>1.0</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1.0 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1      1.0"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(a=1, b=1.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>1 rows × 2 columns</p><tr><th>1</th><td>397673136</td><td>1.96478e-315</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 397673136 & 1.96478e-315 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a         \u001b[0m\u001b[1m b            \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64     \u001b[0m\u001b[90m Float64      \u001b[0m\n",
       "─────┼─────────────────────────\n",
       "   1 │ 397673136  1.96478e-315"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "similar(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "number of rows in a new DataFrame can be passed as a second argument"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>0 rows × 2 columns</p></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m0×2 DataFrame\u001b[0m"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "similar(x, 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>-1</td><td>0.0</td></tr><tr><th>2</th><td>1</td><td>1.96493e-315</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & -1 & 0.0 \\\\\n",
       "\t2 & 1 & 1.96493e-315 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b            \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64      \u001b[0m\n",
       "─────┼─────────────────────\n",
       "   1 │    -1  0.0\n",
       "   2 │     1  1.96493e-315"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "similar(x, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also create a new `DataFrame` from `SubDataFrame` or `DataFrameRow` (discussed in detail later in the tutorial; in particular although `DataFrameRow` is considered a 1-dimensional object similar to a `NamedTuple` it gets converted to a 1-row `DataFrame` for convinience)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>1.0</td></tr><tr><th>2</th><td>1</td><td>1.0</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1.0 \\\\\n",
       "\t2 & 1 & 1.0 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 SubDataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1      1.0\n",
       "   2 │     1      1.0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sdf = view(x, [1,1], :)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "typeof(sdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>1.0</td></tr><tr><th>2</th><td>1</td><td>1.0</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1.0 \\\\\n",
       "\t2 & 1 & 1.0 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1      1.0\n",
       "   2 │     1      1.0"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(sdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (2 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><tr><th>1</th><td>1</td><td>1.0</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1.0 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1      1.0"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dfr = x[1, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th>Int64</th><th>Float64</th></tr></thead><tbody><p>1 rows × 2 columns</p><tr><th>1</th><td>1</td><td>1.0</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& a & b\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Float64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 1.0 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m b       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Float64 \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1      1.0"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(dfr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conversion to a matrix\n",
    "\n",
    "Let's start by creating a `DataFrame` with two rows and two columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(x=1:2, y=[\"A\", \"B\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create a matrix by passing this `DataFrame` to `Matrix` or `Array`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Any,2}:\n",
       " 1  \"A\"\n",
       " 2  \"B\""
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Matrix(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Any,2}:\n",
       " 1  \"A\"\n",
       " 2  \"B\""
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Array(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This would work even if the `DataFrame` had some `missing`s:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String?</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td><em>missing</em></td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String?\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & \\emph{missing} \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String? \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1 \u001b[90m missing \u001b[0m\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(x=1:2, y=[missing,\"B\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Any,2}:\n",
       " 1  missing\n",
       " 2  \"B\""
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Matrix(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the two previous matrix examples, Julia created matrices with elements of type `Any`. We can see more clearly that the type of matrix is inferred when we pass, for example, a `DataFrame` of integers to `Matrix`, creating a 2D `Array` of `Int64`s:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>Int64</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>3</td></tr><tr><th>2</th><td>2</td><td>4</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 3 \\\\\n",
       "\t2 & 2 & 4 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y     \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n",
       "─────┼──────────────\n",
       "   1 │     1      3\n",
       "   2 │     2      4"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(x=1:2, y=3:4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Int64,2}:\n",
       " 1  3\n",
       " 2  4"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Matrix(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this next example, Julia correctly identifies that `Union` is needed to express the type of the resulting `Matrix` (which contains `missing`s)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>Int64?</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td><em>missing</em></td></tr><tr><th>2</th><td>2</td><td>4</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64?\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & \\emph{missing} \\\\\n",
       "\t2 & 2 & 4 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64?  \u001b[0m\n",
       "─────┼────────────────\n",
       "   1 │     1 \u001b[90m missing \u001b[0m\n",
       "   2 │     2        4"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(x=1:2, y=[missing,4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2×2 Array{Union{Missing, Int64},2}:\n",
       " 1   missing\n",
       " 2  4"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Matrix(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we can't force a conversion of `missing` values to `Int`s!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "ename": "LoadError",
     "evalue": "ArgumentError: cannot convert a DataFrame containing missing values to Matrix{Int64} (found for column y)",
     "output_type": "error",
     "traceback": [
      "ArgumentError: cannot convert a DataFrame containing missing values to Matrix{Int64} (found for column y)",
      "",
      "Stacktrace:",
      " [1] convert(::Type{Array{Int64,2}}, ::DataFrame) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\abstractdataframe\\abstractdataframe.jl:1156",
      " [2] Array{Int64,2}(::DataFrame) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\abstractdataframe\\abstractdataframe.jl:1168",
      " [3] top-level scope at In[41]:1",
      " [4] include_string(::Function, ::Module, ::String, ::String) at .\\loading.jl:1091"
     ]
    }
   ],
   "source": [
    "Matrix{Int}(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conversion to `NamedTuple` related tabular structures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First define some data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = DataFrame(x=1:2, y=[\"A\", \"B\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we convert a `DataFrame` into a `NamedTuple` of vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(x = [1, 2], y = [\"A\", \"B\"])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct = Tables.columntable(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we convert it into a vector of `NamedTuples`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2-element Array{NamedTuple{(:x, :y),Tuple{Int64,String}},1}:\n",
       " (x = 1, y = \"A\")\n",
       " (x = 2, y = \"B\")"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rt = Tables.rowtable(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can perform the conversions back to a `DataFrame` using a standard constructor call:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(ct)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(rt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating data frame by rows or columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes it is useful to create a wrapper around a `DataFrame` that produces its rows or columns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For iterating columns you can use the `eachcol` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>2×2 DataFrameColumns</p><table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrameColumns\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ec = eachcol(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`DataFrameColumns` object behaves as a vector (note though it is not `AbstractVector`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "false"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ec isa AbstractVector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2-element Array{Int64,1}:\n",
       " 1\n",
       " 2"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ec[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "but you can also index into it using column names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2-element Array{Int64,1}:\n",
       " 1\n",
       " 2"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ec[\"x\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "similarly `eachrow` creates a `DataFrameRows` object that is a vector of its rows"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>2×2 DataFrameRows</p><table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrameRows\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "er = eachrow(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`DataFrameRows` is an `AbstractVector`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "true"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "er isa AbstractVector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p>DataFrameRow (2 columns)</p><table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1mDataFrameRow\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "er[end]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that both data frame and also `DataFrameColumns` and `DataFrameRows` objects are not type stable (they do not know the types of their columns). This is useful to avoid compilation cost if you have very wide data frames with heterogenous column types."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, often (especially if a data frame is narrows) it is useful to create a lazy iterator that produces `NamedTuple`s for each row of the `DataFrame`. Its key benefit is that it is type stable (so it is useful when you want to perform some operations in a fast way on a small subset of columns of a `DataFrame` - this strategy is often used internally by DataFrames.jl package):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Tables.NamedTupleIterator{Tables.Schema{(:x, :y),Tuple{Int64,String}},Tables.RowIterator{NamedTuple{(:x, :y),Tuple{Array{Int64,1},Array{String,1}}}}}(Tables.RowIterator{NamedTuple{(:x, :y),Tuple{Array{Int64,1},Array{String,1}}}}((x = [1, 2], y = [\"A\", \"B\"]), 2))"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nti = Tables.namedtupleiterator(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "row = (1, (x = 1, y = \"A\"))\n",
      "row = (2, (x = 2, y = \"B\"))\n"
     ]
    }
   ],
   "source": [
    "for row in enumerate(nti)\n",
    "    @show row\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "similarly to the previous options you can easily convert `NamedTupleIterator` back to a `DataFrame`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th></tr><tr><th></th><th>Int64</th><th>String</th></tr></thead><tbody><p>2 rows × 2 columns</p><tr><th>1</th><td>1</td><td>A</td></tr><tr><th>2</th><td>2</td><td>B</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|cc}\n",
       "\t& x & y\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & String\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & A \\\\\n",
       "\t2 & 2 & B \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×2 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x     \u001b[0m\u001b[1m y      \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m String \u001b[0m\n",
       "─────┼───────────────\n",
       "   1 │     1  A\n",
       "   2 │     2  B"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "DataFrame(nti)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Handling of duplicate column names\n",
    "\n",
    "We can pass the `makeunique` keyword argument to allow passing duplicate names (they get deduplicated)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>a</th><th>a_2</th><th>a_1</th></tr><tr><th></th><th>Int64</th><th>Int64</th><th>Int64</th></tr></thead><tbody><p>1 rows × 3 columns</p><tr><th>1</th><td>1</td><td>2</td><td>3</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& a & a\\_2 & a\\_1\\\\\n",
       "\t\\hline\n",
       "\t& Int64 & Int64 & Int64\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 & 2 & 3 \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m1×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m a     \u001b[0m\u001b[1m a_2   \u001b[0m\u001b[1m a_1   \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\u001b[90m Int64 \u001b[0m\n",
       "─────┼─────────────────────\n",
       "   1 │     1      2      3"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = DataFrame(:a=>1, :a=>2, :a_1=>3; makeunique=true)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Otherwise, duplicates are not allowed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "ename": "LoadError",
     "evalue": "ArgumentError: Duplicate variable names: :a. Pass makeunique=true to make them unique using a suffix automatically.",
     "output_type": "error",
     "traceback": [
      "ArgumentError: Duplicate variable names: :a. Pass makeunique=true to make them unique using a suffix automatically.",
      "",
      "Stacktrace:",
      " [1] make_unique!(::Array{Symbol,1}, ::Array{Symbol,1}; makeunique::Bool) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\other\\utils.jl:35",
      " [2] #make_unique#2 at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\other\\utils.jl:57 [inlined]",
      " [3] #Index#3 at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\other\\index.jl:27 [inlined]",
      " [4] DataFrame(::Pair{Symbol,Int64}, ::Vararg{Pair{Symbol,Int64},N} where N; makeunique::Bool, copycols::Bool) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\dataframe\\dataframe.jl:218",
      " [5] DataFrame(::Pair{Symbol,Int64}, ::Vararg{Pair{Symbol,Int64},N} where N) at D:\\AppData\\.julia\\packages\\DataFrames\\X0xNW\\src\\dataframe\\dataframe.jl:216",
      " [6] top-level scope at In[58]:1",
      " [7] include_string(::Function, ::Module, ::String, ::String) at .\\loading.jl:1091"
     ]
    }
   ],
   "source": [
    "df = DataFrame(:a=>1, :a=>2, :a_1=>3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Observe that currently `nothing` is not printed when displaying a `DataFrame` in Jupyter Notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th><th>z</th></tr><tr><th></th><th>Union…</th><th>Union…</th><th>String?</th></tr></thead><tbody><p>2 rows × 3 columns</p><tr><th>1</th><td>1</td><td></td><td><em>missing</em></td></tr><tr><th>2</th><td></td><td>a</td><td>c</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x & y & z\\\\\n",
       "\t\\hline\n",
       "\t& Union… & Union… & String?\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 &  & \\emph{missing} \\\\\n",
       "\t2 &  & a & c \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x      \u001b[0m\u001b[1m y      \u001b[0m\u001b[1m z       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Union… \u001b[0m\u001b[90m Union… \u001b[0m\u001b[90m String? \u001b[0m\n",
       "─────┼─────────────────────────\n",
       "   1 │ 1      \u001b[90m        \u001b[0m\u001b[90m missing \u001b[0m\n",
       "   2 │\u001b[90m        \u001b[0m a       c"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = DataFrame(x=[1, nothing], y=[nothing, \"a\"], z=[missing, \"c\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally you can use `empty` and `empty!` functions to remove all rows from a data frame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th><th>z</th></tr><tr><th></th><th>Union…</th><th>Union…</th><th>String?</th></tr></thead><tbody><p>0 rows × 3 columns</p></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x & y & z\\\\\n",
       "\t\\hline\n",
       "\t& Union… & Union… & String?\\\\\n",
       "\t\\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m0×3 DataFrame\u001b[0m"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "empty(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th><th>z</th></tr><tr><th></th><th>Union…</th><th>Union…</th><th>String?</th></tr></thead><tbody><p>2 rows × 3 columns</p><tr><th>1</th><td>1</td><td></td><td><em>missing</em></td></tr><tr><th>2</th><td></td><td>a</td><td>c</td></tr></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x & y & z\\\\\n",
       "\t\\hline\n",
       "\t& Union… & Union… & String?\\\\\n",
       "\t\\hline\n",
       "\t1 & 1 &  & \\emph{missing} \\\\\n",
       "\t2 &  & a & c \\\\\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m2×3 DataFrame\u001b[0m\n",
       "\u001b[1m Row \u001b[0m│\u001b[1m x      \u001b[0m\u001b[1m y      \u001b[0m\u001b[1m z       \u001b[0m\n",
       "\u001b[1m     \u001b[0m│\u001b[90m Union… \u001b[0m\u001b[90m Union… \u001b[0m\u001b[90m String? \u001b[0m\n",
       "─────┼─────────────────────────\n",
       "   1 │ 1      \u001b[90m        \u001b[0m\u001b[90m missing \u001b[0m\n",
       "   2 │\u001b[90m        \u001b[0m a       c"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th><th>z</th></tr><tr><th></th><th>Union…</th><th>Union…</th><th>String?</th></tr></thead><tbody><p>0 rows × 3 columns</p></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x & y & z\\\\\n",
       "\t\\hline\n",
       "\t& Union… & Union… & String?\\\\\n",
       "\t\\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m0×3 DataFrame\u001b[0m"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "empty!(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><thead><tr><th></th><th>x</th><th>y</th><th>z</th></tr><tr><th></th><th>Union…</th><th>Union…</th><th>String?</th></tr></thead><tbody><p>0 rows × 3 columns</p></tbody></table>"
      ],
      "text/latex": [
       "\\begin{tabular}{r|ccc}\n",
       "\t& x & y & z\\\\\n",
       "\t\\hline\n",
       "\t& Union… & Union… & String?\\\\\n",
       "\t\\hline\n",
       "\\end{tabular}\n"
      ],
      "text/plain": [
       "\u001b[1m0×3 DataFrame\u001b[0m"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "Julia 1.5.3",
   "language": "julia",
   "name": "julia-1.5"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
